High Performance LDA through Collective Model Communication Optimization

Authors

Bingjing Zhang

Bo Peng

Judy Qiu

Abstract

Latent Dirichlet Allocation (LDA) is a widely used machine learning technique for big data analysis. The application includes an inference algorithm that iteratively updates a model until it converges. A major challenge in parallelizing it is scaling: the model is huge, and parallel workers need to communicate it continually. We identify three important features of the model in parallel LDA computation: (1) the volume of model parameters required for local computation is high; (2) the time complexity of local computation is proportional to the required model size; (3) the model size shrinks as it converges. By investigating collective and asynchronous methods for model communication in different tools, we discover that optimized collective communication can improve the model update speed, thus allowing the model to converge faster. The performance improvement derives not only from accelerated communication but also from reduced per-iteration computation time, because the model size shrinks as the model converges. To foster faster model convergence, we design new collective communication abstractions and implement two Harp-LDA applications, “lgs” and “rtt”. We compare our new approach with Yahoo! LDA and Petuum LDA, two leading implementations that favor asynchronous communication methods, on a 100-node, 4000-thread Intel Haswell cluster. The experiments show that “lgs” can reach higher model likelihood with shorter or similar execution time compared with Yahoo! LDA, while “rtt” can run up to 3.9 times faster than Petuum LDA when achieving similar model likelihood.
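The idea of a synchronized, collective model update can be illustrated with a minimal sketch. It is not the Harp-LDA code or API: the function names (`allreduce_sum`, `apply_delta`, `synchronized_step`), the sparse dictionary layout, and the in-process "workers" are all assumptions made for illustration. Each worker is assumed to produce a delta of word-topic counts from its local sampling; the deltas are summed in one collective step, and every worker then sees the same updated model. Entries that fall to zero are dropped, which is one simple way to picture the model shrinking as it converges.

```python
from typing import Dict, List, Tuple

# Hypothetical sparse layout: (word_id, topic_id) -> token count.
Key = Tuple[int, int]
Counts = Dict[Key, int]


def allreduce_sum(deltas: List[Counts]) -> Counts:
    """Sum per-worker count deltas into one combined delta.

    This simulates an allreduce inside a single process; a real system
    would exchange the deltas over the network.
    """
    total: Counts = {}
    for delta in deltas:
        for key, change in delta.items():
            total[key] = total.get(key, 0) + change
    return total


def apply_delta(model: Counts, delta: Counts) -> Counts:
    """Return a new model with the delta applied, dropping zero entries."""
    updated = dict(model)
    for key, change in delta.items():
        value = updated.get(key, 0) + change
        if value < 0:
            raise ValueError(f"negative count for {key}")
        if value == 0:
            updated.pop(key, None)  # sparse model: zero counts are not stored
        else:
            updated[key] = value
    return updated


def synchronized_step(model: Counts, worker_deltas: List[Counts]) -> Counts:
    """One synchronized update: combine all worker deltas, then apply them once."""
    return apply_delta(model, allreduce_sum(worker_deltas))


# Toy example with two words, two topics, and two workers.
model = {(0, 0): 2, (0, 1): 1, (1, 0): 1, (1, 1): 3}
worker_a = {(0, 1): -1, (0, 0): +1}  # moves one token of word 0 to topic 0
worker_b = {(1, 0): -1, (1, 1): +1}  # moves one token of word 1 to topic 1
new_model = synchronized_step(model, [worker_a, worker_b])
print(len(model), "->", len(new_model), "non-zero entries")
```

In this toy run the total token count is unchanged while the number of stored entries drops, so a later iteration that scans only non-zero entries would touch less of the model.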

Related resources

HarpLDA+: Optimizing latent dirichlet allocation for parallel efficiency

Latent Dirichlet Allocation (LDA) is a widely used machine learning technique in topic modeling and data analysis. Training large LDA models on big datasets involves dynamic and irregular computation patterns and is a major challenge to both algorithm optimization and system design. In this paper, we present a comprehensive benchmarking of our novel synchronized LDA training system HarpLDA+ bas...

A Survey of Methods for Collective Communication Optimization and Tuning

New developments in HPC technology in terms of increasing computing power on multi/many core processors, high bandwidth memory/IO subsystems and communication interconnects, pose a direct impact on software and runtime system development. These advancements have become useful in producing high-performance collective communication interfaces that integrate efficiently on a wide variety of platfo...

Optimization of Collective Communications in HeteroMPI

HeteroMPI is an extension of MPI designed for high performance computing on heterogeneous networks of computers. The recent new feature of HeteroMPI is the optimized version of collective communications. The optimization is based on a novel performance communication model of switch-based computational clusters. In particular, the model reflects significant non-deterministic and non-linear escal...

Performance Characterisation of Intra-Cluster Collective Communications

Although recent works try to improve collective communication in grid systems by separating intra- and inter-cluster communication, the optimisation of communications focuses only on inter-cluster communications. We believe, instead, that the overall performance of the application may be improved if intra-cluster collective communications performance is known in advance. Hence, it is important to ha...

OPTIMIZATION-BASED MONITORING-SUPPORTED CALIBRATION OF A THERMAL PERFORMANCE SIMULATION MODEL

Building performance simulation is being increasingly deployed beyond the building design phase to support efficient building operation. Specifically, the predictive feature of the simulation-assisted building systems control strategy provides distinct advantages in view of building systems with high latency and inertia. Such advantages can be exploited only if model predictions can be relied u...

Publication date: 2016
